Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure: Remove AKS vmType #6186

Merged

Conversation

jackfrancis
Copy link
Contributor

@jackfrancis jackfrancis commented Oct 10, 2023

What type of PR is this?

/kind deprecation
/kind documentation
/kind cleanup

What this PR does / why we need it:

This PR removes the obsolete, non-working implementation for manual AKS cluster integration as expressed via the "aks" vmType configuration.

Node autoscaling has long been a first class GA feature of AKS. See:

Running the Azure cluster-autoscaler provider manually on your cluster is not a supported feature. To aid future maintainability of the project, we're removing these non-working code paths and updating documentation.

Which issue(s) this PR fixes:

Fixes #6190

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Azure: Remove AKS vmType

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. kind/documentation Categorizes issue or PR as related to documentation. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 10, 2023
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider area/cluster-autoscaler size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 10, 2023
@jackfrancis
Copy link
Contributor Author

/assign @gandhipr

@jackfrancis
Copy link
Contributor Author

jackfrancis commented Oct 10, 2023

Validated "vmss" vmType on AKS cluster:

$ git status
On branch azure-aks-vmtype-deprecate
Your branch is up to date with 'origin/azure-aks-vmtype-deprecate'.

nothing to commit, working tree clean
$ TAG=$(git rev-parse --short HEAD) make build
rm -f cluster-autoscaler-amd64
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o cluster-autoscaler-amd64 --ldflags "-s"
$ REGISTRY=docker.io/jackfrancis TAG=$(git rev-parse --short HEAD) make make-image
docker build --pull \
		-t docker.io/jackfrancis/cluster-autoscaler-amd64:00f9286d1 \
		-f Dockerfile.amd64 .
[+] Building 3.4s (7/7) FINISHED                                                                                                                                                                   docker:desktop-linux
 => [internal] load .dockerignore                                                                                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                                                                                    0.0s
 => [internal] load build definition from Dockerfile.amd64                                                                                                                                                         0.0s
 => => transferring dockerfile: 866B                                                                                                                                                                               0.0s
 => [internal] load metadata for gcr.io/distroless/static:nonroot-amd64                                                                                                                                            0.4s
 => [internal] load build context                                                                                                                                                                                  1.7s
 => => transferring context: 143.34MB                                                                                                                                                                              1.7s
 => CACHED [1/3] FROM gcr.io/distroless/static:nonroot-amd64@sha256:5f1b35fe0293eebf6e574ee21d43a0ad55beb99e7384427216f5f18eaa9d5077                                                                               0.0s
 => [2/3] COPY cluster-autoscaler-amd64 /cluster-autoscaler                                                                                                                                                        0.7s
 => exporting to image                                                                                                                                                                                             0.6s
 => => exporting layers                                                                                                                                                                                            0.6s
 => => writing image sha256:d0ddf4ffba96368c5748c7669c21e256e1855148ec0ba8ef66125d5f3ad7b957                                                                                                                       0.0s
 => => naming to docker.io/jackfrancis/cluster-autoscaler-amd64:00f9286d1  

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview
Image 00f9286d1-amd64 completed
$ REGISTRY=docker.io/jackfrancis TAG=$(git rev-parse --short HEAD) make push-image
./push_image.sh docker.io/jackfrancis/cluster-autoscaler-amd64:00f9286d1
About to push image docker.io/jackfrancis/cluster-autoscaler-amd64:00f9286d1
Are you sure? [y/N] y
Error response from daemon: manifest for jackfrancis/cluster-autoscaler-amd64:00f9286d1 not found: manifest unknown: manifest unknown
The push refers to repository [docker.io/jackfrancis/cluster-autoscaler-amd64]
e918121e441b: Pushed 
4cb10dd2545b: Layer already exists 
d2d7ec0f6756: Layer already exists 
1a73b54f556b: Layer already exists 
e624a5370eca: Layer already exists 
d52f02c6501c: Layer already exists 
ff5700ec5418: Layer already exists 
7bea6b893187: Layer already exists 
6fbdf253bbc2: Layer already exists 
54ad2ec71039: Layer already exists 
00f9286d1: digest: sha256:f89fac8e2c68f579cd6577d5e74ebb459fcb077673f370909b5fdb7dac575126 size: 2402
$ helm install cluster-autoscaler cluster-autoscaler --repo https://kubernetes.github.io/autoscaler --set cloudProvider=azure --set autoscalingGroups[0].name=aks-nodepool2-26139306-vmss,autoscalingGroups[0].maxSize=100,autoscalingGroups[0].minSize=0 --set azureClientID=$TEST_AZURE_SP_ID --set azureClientSecret=$TEST_AZURE_SP_PW --set azureSubscriptionID=$TEST_AZURE_SUB_ID --set azureTenantID=$TEST_AZURE_TENANT_ID --set azureResourceGroup=MC_francis-aks-test_francis-aks-test_eastus --set azureVMType=vmss --set image.repository=docker.io/jackfrancis/cluster-autoscaler-amd64 --set image.tag=00f9286d1
NAME: cluster-autoscaler
LAST DEPLOYED: Wed Oct 11 15:20:32 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
To verify that cluster-autoscaler has started, run:

  kubectl --namespace=default get pods -l "app.kubernetes.io/name=azure-cluster-autoscaler,app.kubernetes.io/instance=cluster-autoscaler"
$ k apply -f https://k8s.io/examples/application/php-apache.yaml
deployment.apps/php-apache created
service/php-apache created

Validated scale in to zero:

$ k scale deployment/php-apache --replicas=0
deployment.apps/php-apache scaled
$ k get nodes -o wide | grep aks-nodepool2 | wc -l && sleep 600 && k get nodes -o wide | grep aks-nodepool2 | wc -l
       4
       0

Validated scale out from zero:

$ k scale deployment/php-apache --replicas=1000
deployment.apps/php-apache scaled
$ k get nodes -o wide | grep aks-nodepool2 | wc -l && sleep 600 && k get nodes -o wide | grep aks-nodepool2 | wc -l
       0
     100

@jackfrancis jackfrancis force-pushed the azure-aks-vmtype-deprecate branch from 5117ffb to 00f9286 Compare October 11, 2023 22:14
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/helm-charts and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 11, 2023
@jackfrancis
Copy link
Contributor Author

@gjtempleton am I doing the right or wrong thing updating the helm chart (and bumping the version) in this PR? Or would we prefer to separate the helm changes from the underlying code changes?

@jackfrancis
Copy link
Contributor Author

/hold

while I work on a possible fix

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 13, 2023
@jackfrancis jackfrancis force-pushed the azure-aks-vmtype-deprecate branch from 00f9286 to 51e401f Compare October 25, 2023 00:08
@jackfrancis
Copy link
Contributor Author

Current plan is to move forward w/ this.

cc @prachirp @tallaxes

I shared a PSA about this in SIG Autoscaling weekly meeting today. I'll wait a week or so to allow for public feedback, then I'll press for a final review and merge.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 15, 2023
@jackfrancis jackfrancis force-pushed the azure-aks-vmtype-deprecate branch 2 times, most recently from d95b4e6 to 3def39e Compare November 15, 2023 19:51
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 15, 2023
@jackfrancis jackfrancis force-pushed the azure-aks-vmtype-deprecate branch from 3def39e to c151b5c Compare November 15, 2023 19:59
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 15, 2023
@jackfrancis jackfrancis force-pushed the azure-aks-vmtype-deprecate branch from c151b5c to af1a002 Compare November 21, 2023 17:57
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2023
@tallaxes
Copy link
Contributor

/approve

@jackfrancis
Copy link
Contributor Author

@gjtempleton is this PR ready to merge in terms of helm chart updates?

Chart changes + new minor version change are in this PR.

Thanks!

Signed-off-by: Jack Francis <[email protected]>
@jackfrancis jackfrancis force-pushed the azure-aks-vmtype-deprecate branch from af1a002 to 9e526ae Compare November 27, 2023 15:17
@jackfrancis
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 27, 2023
@tallaxes
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 27, 2023
@jackfrancis
Copy link
Contributor Author

/assign @gjtempleton

@jackfrancis
Copy link
Contributor Author

/approve

@Shubham82
Copy link
Contributor

@gjtempleton could you please approve this PR, so that it will merge?
Thanks!

@gjtempleton
Copy link
Member

Looks good to me from the chart side.

thanks!
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bryce-Soghigian, gjtempleton, jackfrancis, tallaxes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 4, 2023
@k8s-ci-robot k8s-ci-robot merged commit dfd20f2 into kubernetes:master Dec 4, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/helm-charts area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Azure: "AKS" vmType no longer works
7 participants